Conversation

casteryh (Contributor) opened this pull request.

No description provided.

@meta-cla bot added the "CLA Signed" label on Oct 13, 2025. (This label is managed by the Meta Open Source bot.)
@LucasLLC (Contributor) left a comment:

I'm open to changing the transport buffer interface, but the burden of this PR is to:

  • ensure resharding tests don't break (and that we don't drop support for contiguous tensors)
  • prove this is better
  • prove that this is faster

Arguably the interface is too biased towards rdma buffer, but any new interface needs to be generically supported against all backends (incl gloo in flight).

If your goal is to make rdma buffer faster, can you use test_models.py and test whether this is faster?

Inline review on the changed call site (old call first, new call below; the new line is truncated in the diff excerpt):

    await self.storage_volume.get.call_one(
        key, transport_buffer, request.meta_only()
    )
    transport_buffer = await self.storage_volume.get.call_one(

Reviewer (Contributor):

You're creating a race condition here -- memory is often created on the fly in storage volume to deal with non-contiguous tensors.

casteryh (Author) replied:

In storage volume, all the tensors are already contiguous, and it's just handing out RDMABuffers pointing to those tensors.
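To illustrate that claim (a hedged sketch, not the actual torchstore code; `StorageVolume` and `RemoteRef` are hypothetical stand-ins, with numpy standing in for torch):

```python
import numpy as np

class RemoteRef:
    """Hypothetical stand-in for an RDMABuffer: a non-owning reference
    to already-contiguous memory (address + length only)."""
    def __init__(self, array: np.ndarray):
        assert array.flags["C_CONTIGUOUS"], "RDMA-style refs need contiguous memory"
        self.addr = array.ctypes.data   # points at the existing storage, no copy
        self.nbytes = array.nbytes

class StorageVolume:
    """Hypothetical storage volume: tensors are made contiguous once at
    put() time, so get() can hand out refs without allocating anything."""
    def __init__(self):
        self._store = {}

    def put(self, key, tensor):
        # The only copy (if any) happens here, at ingest time.
        self._store[key] = np.ascontiguousarray(tensor)

    def get(self, key):
        # Read path allocates nothing: the ref aliases stored memory.
        return RemoteRef(self._store[key])

vol = StorageVolume()
strided = np.arange(12, dtype=np.float32).reshape(3, 4)[:, ::2]  # non-contiguous
vol.put("w", strided)
ref = vol.get("w")
```

Under this model the read path never creates memory on the fly, which is the premise behind saying there is no race on `get`.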

@codecov-commenter commented:

Codecov Report

❌ Patch coverage is 70.96774% with 9 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@871fa11). Learn more about missing BASE report.

Files with missing lines          Patch %   Lines
torchstore/transport/buffers.py   58.82%    7 Missing ⚠️
torchstore/storage_volume.py      88.88%    1 Missing ⚠️
torchstore/transport/pipe.py      80.00%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #57   +/-   ##
=======================================
  Coverage        ?   61.01%           
=======================================
  Files           ?       22           
  Lines           ?     1698           
  Branches        ?        0           
=======================================
  Hits            ?     1036           
  Misses          ?      662           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.

@casteryh (Author) commented Oct 13, 2025:

> I'm open to changing the transport buffer interface, but the burden of this PR is to:
>   • ensure resharding tests don't break (and that we don't drop support for contiguous tensors)

The test itself seemed broken for me (it either hangs or is extremely slow; I had been waiting for 10 minutes).
Update: it did pass, it was indeed just slow, taking 20 minutes to complete.
Ran the integration tests in forge and they passed for Qwen 8B, trainer fsdp=2, policy tp=2.

>   • prove this is better

I think TransportBuffer shouldn't need to allocate anything and shouldn't own anything. It's easier to reason about if we simply treat it as a "remote reference" to a tensor of sorts.
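A minimal sketch of that framing (hypothetical names, not the actual torchstore interface): a transport buffer that only describes existing tensor memory, while ownership stays wherever the tensor lives.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TensorRef:
    """Hypothetical 'remote reference' transport buffer: it records where
    the bytes live plus enough metadata to reinterpret them on the far
    side, but it allocates nothing and owns nothing."""
    addr: int      # address of the already-contiguous storage
    nbytes: int    # length of the region in bytes
    dtype: str     # e.g. "float32"
    shape: tuple   # logical shape, for reconstruction remotely

def ref_from_contiguous(addr, nbytes, dtype, shape):
    # In the spirit of from_contiguous_tensor: the caller guarantees the
    # memory is contiguous; the ref merely captures metadata about it.
    return TensorRef(addr, nbytes, dtype, tuple(shape))

# Example: a 16x16 float32 tensor at some (made-up) address.
ref = ref_from_contiguous(0x7F0000010000, 16 * 16 * 4, "float32", (16, 16))
```

Because the ref is immutable and owns no storage, its lifetime is trivially decoupled from the tensor's, which is what makes the design easy to reason about.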

>   • prove that this is faster

This will need more evidence, but see below for results with test_models.py.

> Arguably the interface is too biased towards rdma buffer, but any new interface needs to be generically supported against all backends (incl gloo in flight).

I agree. For example, with gloo we can probably make non-contiguous tensors work without extra allocation. I think we can always change the interface later to add something less restrictive than from_contiguous_tensor.
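To make that concrete (a hedged numpy sketch, not gloo's or torchstore's actual code; `contiguous_runs` is a hypothetical helper): a message-passing backend can walk a strided tensor and transmit its contiguous runs one by one, so at most small per-run copies are needed, never a whole-tensor staging buffer.

```python
import numpy as np

def contiguous_runs(a):
    """Yield the contiguous pieces of a possibly-strided array so a
    message-passing backend can send them individually, without ever
    materializing a contiguous copy of the whole tensor."""
    if a.flags["C_CONTIGUOUS"]:
        yield a.reshape(-1)        # a view, not a copy
    elif a.ndim <= 1:
        # Innermost strided run: a small per-run copy, never whole-tensor.
        yield np.ascontiguousarray(a)
    else:
        for sub in a:              # recurse over the outermost axis
            yield from contiguous_runs(sub)

x = np.arange(12, dtype=np.float32).reshape(3, 4)[:, ::2]  # non-contiguous view
runs = list(contiguous_runs(x))
```

Concatenating the runs on the receiving side reproduces the tensor's logical contents, which is all a copy-based transport like gloo needs.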

> If your goal is to make rdma buffer faster, can you use test_models.py and test whether this is faster?

Yes, on a single Slurm node: put went from 7 seconds to 4.3 seconds. I will run 32B in forge to double-check.
before: https://www.internalfb.com/phabricator/paste/view/P1991001240

[0] rank: 0 pushed state dict in 7.046610719058663 seconds
[0] rank: 0 got state dict in 4.545355102978647 seconds

after: https://www.internalfb.com/phabricator/paste/view/P1990993018

[0] rank: 0 pushed state dict in 4.3794964698608965 seconds
[0] rank: 0 got state dict in 4.107833093032241 seconds

@casteryh (Author):

From a forge e2e run on Slurm (32B, multi-node):
without patch:

STEP 1
  policy_worker_perf/update_weights/total_duration_avg_s: 78.15236387704499
  policy_worker_perf/update_weights/total_duration_max_s: 83.50384995015338
  rl_trainer_perf/push_weights/total_duration_avg_s: 9.24261197517626
  rl_trainer_perf/push_weights/total_duration_max_s: 10.174692547880113
STEP 2
  policy_worker_perf/update_weights/total_duration_avg_s: 76.40055759515963
  policy_worker_perf/update_weights/total_duration_max_s: 82.62615168001503
  rl_trainer_perf/push_weights/total_duration_avg_s: 7.3707064733607695
  rl_trainer_perf/push_weights/total_duration_max_s: 8.557962979190052
STEP 3
  policy_worker_perf/update_weights/total_duration_avg_s: 76.61371284359484
  policy_worker_perf/update_weights/total_duration_max_s: 82.17285228613764
  rl_trainer_perf/push_weights/total_duration_avg_s: 7.345048399467487
  rl_trainer_perf/push_weights/total_duration_max_s: 8.0563137922436
STEP 4
  policy_worker_perf/update_weights/total_duration_avg_s: 76.53273953814642
  policy_worker_perf/update_weights/total_duration_max_s: 81.62242694292217
  rl_trainer_perf/push_weights/total_duration_avg_s: 7.770320081850514
  rl_trainer_perf/push_weights/total_duration_max_s: 8.817566874902695

with patch:

STEP 1
  policy_worker_perf/update_weights/total_duration_avg_s: 62.47022197701153
  policy_worker_perf/update_weights/total_duration_max_s: 69.53650338202715
  rl_trainer_perf/push_weights/total_duration_avg_s: 9.41419801331358
  rl_trainer_perf/push_weights/total_duration_max_s: 10.834439367055893
STEP 2
  policy_worker_perf/update_weights/total_duration_avg_s: 61.52622404671274
  policy_worker_perf/update_weights/total_duration_max_s: 70.85746217798442
  rl_trainer_perf/push_weights/total_duration_avg_s: 8.029894174134824
  rl_trainer_perf/push_weights/total_duration_max_s: 9.306584176141769
STEP 3
  policy_worker_perf/update_weights/total_duration_avg_s: 62.209209970140364
  policy_worker_perf/update_weights/total_duration_max_s: 72.19040106609464
  rl_trainer_perf/push_weights/total_duration_avg_s: 7.521832116821315
  rl_trainer_perf/push_weights/total_duration_max_s: 8.884237607009709
STEP 4
  policy_worker_perf/update_weights/total_duration_avg_s: 62.030980555253336
  policy_worker_perf/update_weights/total_duration_max_s: 72.34240108402446
  rl_trainer_perf/push_weights/total_duration_avg_s: 7.057374230294954
  rl_trainer_perf/push_weights/total_duration_max_s: 7.307658496778458

@casteryh (Author):

ptal @LucasLLC
